Center Based Clustering: A Foundational Perspective

Author

  • Pranjal Awasthi
Abstract

In the first part of this chapter we present existing work on center-based clustering methods. In particular, we focus on k-means and k-median clustering, which are two of the most widely used clustering objectives. We describe popular heuristics for these methods and the theoretical guarantees associated with them. We also describe how to design worst-case approximately optimal algorithms for these problems. In the second part of the chapter we describe recent work on how to improve on these worst-case algorithms even further by using insights from the nature of real-world clustering problems and data sets. Finally, we summarize theoretical work on clustering data generated from mixture models such as a mixture of Gaussians.

1 Approximation algorithms for k-means and k-median

One of the most popular approaches to clustering is to define an objective function over the data points and find a partitioning that achieves the optimal, or an approximately optimal, value of that objective. Common choices include center-based objectives such as k-median and k-means, where one selects k center points and the clustering is obtained by assigning each data point to its closest center point. In k-median clustering the objective is to find center points $c_1, c_2, \dots, c_k$ and a partitioning of the data so as to minimize $\Phi = \sum_{x} \min_{i} d(x, c_i)$. This objective has historically been very useful and is well studied for facility location problems [13, 40]. Similarly, the objective in k-means is to minimize $\Phi = \sum_{x} \min_{i} d(x, c_i)^2$.

The k-means objective is tied to the log-likelihood of data coming from a mixture of Gaussians; hence, optimizing it is closely related to fitting the maximum-likelihood mixture model for a given dataset. For a given set of centers, the optimal clustering for that set is obtained by assigning each data point to its closest center point. This is known as the Voronoi partitioning of the data.
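As a small illustration of the two objectives and of the Voronoi partitioning, the following sketch evaluates both costs for a fixed set of centers. It assumes Euclidean distance for d, and the function names and the toy data are hypothetical, chosen only for this example; it evaluates the objectives and does not optimize them.

```python
import math

def voronoi_assignment(points, centers):
    """For each point, return the index of its closest center.

    Ties are broken in favor of the lowest center index.
    """
    return [
        min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))
        for x in points
    ]

def k_median_cost(points, centers):
    """Sum over points of the distance to the closest center."""
    return sum(min(math.dist(x, c) for c in centers) for x in points)

def k_means_cost(points, centers):
    """Sum over points of the squared distance to the closest center."""
    return sum(min(math.dist(x, c) ** 2 for c in centers) for x in points)

# Toy data (an assumption for illustration): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
centers = [(0.0, 1.0), (10.0, 2.0)]

labels = voronoi_assignment(points, centers)
median_cost = k_median_cost(points, centers)
means_cost = k_means_cost(points, centers)
```

On this toy input the first two points are assigned to the first center and the last two to the second. The k-means cost weights the farther pair more heavily than the k-median cost does, because each distance is squared.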
Unfortunately, optimizing either of these objectives turns out to be NP-hard. Hence, much of the work in the theoretical community focuses on designing good approximation algorithms for these problems [13, 9, 26, 31, 40, 44, 45, 53, 17] with formal guarantees on worst-case instances, as well as on providing better guarantees for nicer, stable instances. In this chapter we discuss several stepping-stone results in these directions, focusing our attention on the k-means objective. Many of the ideas and techniques mentioned apply in a straightforward manner to the k-median objective as well. We will point out crucial differences between the two objectives as they arise. We will additionally discuss several practical implications of these results.


Publication date: 2013